Transforming Chatbot Text: A Sequence-to-Sequence Approach

Reddy, Natesh, Stamp, Mark

arXiv.org Artificial Intelligence

Due to advances in Large Language Models (LLMs) such as ChatGPT, the boundary between human-written text and AI-generated text has become blurred. Nevertheless, recent work has demonstrated that it is possible to reliably detect GPT-generated text. In this paper, we adopt a novel strategy to adversarially transform GPT-generated text using sequence-to-sequence (Seq2Seq) models, with the goal of making the text more human-like. We experiment with the Seq2Seq models T5-small and BART, which serve to modify GPT-generated sentences to include linguistic, structural, and semantic components that may be more typical of human-authored text. Experiments show that classification models trained to distinguish GPT-generated text are significantly less accurate when tested on text that has been modified by these Seq2Seq models. However, after retraining the classification models on data generated by our Seq2Seq technique, they are again able to distinguish the transformed GPT-generated text from human-generated text with high accuracy. This work adds to the accumulating knowledge of text transformation as a tool for both attack -- in the sense of defeating classification models -- and defense -- in the sense of improved classifiers -- thereby advancing our understanding of AI-generated text.
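The attack-then-retrain loop the abstract describes can be illustrated with a deliberately tiny sketch. Both pieces below are toy assumptions, not the paper's actual models: a keyword heuristic stands in for the trained detector, and a rule-based rewrite stands in for the T5/BART Seq2Seq transform.

```python
# Toy stand-ins for the paper's detector and Seq2Seq transform,
# for illustration only.

def toy_detector(text: str) -> str:
    """Flag text containing a telltale machine phrase (toy heuristic)."""
    return "gpt" if "as an ai" in text.lower() else "human"

def toy_transform(text: str) -> str:
    """Stand-in for the Seq2Seq rewrite: strip the telltale phrase."""
    return text.replace("As an AI, ", "").replace("as an AI, ", "")

sample = "As an AI, I think the movie was enjoyable."
before = toy_detector(sample)                 # detected as GPT-generated
after = toy_detector(toy_transform(sample))   # transformed text evades the detector
```

In the paper's actual setting, the "defense" step would retrain the classifier on transformed examples so it learns the residual signals the transform leaves behind.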


SMLT-MUGC: Small, Medium, and Large Texts -- Machine versus User-Generated Content Detection and Comparison

Rawal, Anjali, Wang, Hui, Zheng, Youjia, Lin, Yu-Hsuan, Sushmita, Shanu

arXiv.org Artificial Intelligence

Large language models (LLMs) have gained significant attention due to their ability to mimic human language. Identifying texts generated by LLMs is crucial for understanding their capabilities and mitigating potential consequences. This paper analyzes datasets of varying text lengths: small, medium, and large. We compare the performance of machine learning algorithms on four datasets: (1) small (tweets from Election, FIFA, and Game of Thrones), (2) medium (Wikipedia introductions and PubMed abstracts), and (3) large (OpenAI web text dataset). Our results indicate that LLMs with very large parameter counts (such as the XL-1542 variant of GPT-2, with 1542 million parameters) were harder to detect using traditional machine learning methods (74% accuracy). However, texts of varying lengths from LLMs with smaller parameter counts (762 million or fewer) can be detected with high accuracy (96% and above). We examine the characteristics of human and machine-generated texts across multiple dimensions, including linguistics, personality, sentiment, bias, and morality. Our findings indicate that machine-generated texts generally have higher readability and closely mimic human moral judgments but differ in personality traits. SVM and Voting Classifier (VC) models consistently achieve high performance across most datasets, while Decision Tree (DT) models show the lowest performance. Model performance drops when dealing with rephrased texts, particularly shorter texts like tweets. This study underscores the challenges and importance of detecting LLM-generated texts and suggests directions for future research to improve detection methods and understand the nuanced capabilities of LLMs.
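Traditional classifiers like the SVMs mentioned above typically operate on hand-crafted linguistic statistics rather than raw text. A minimal sketch of that kind of feature extraction, with illustrative feature names that are not the paper's exact feature set:

```python
# Illustrative lexical features a traditional detector (e.g. an SVM)
# might consume; the feature set here is an assumption, not the paper's.
import re

def linguistic_features(text: str) -> dict:
    """Compute simple lexical statistics for one text sample."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1
    return {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(set(words)) / n_words,
        "words_per_sentence": n_words / max(len(sentences), 1),
    }

feats = linguistic_features("The cat sat. The cat sat again.")
```

Feature vectors like this would then be fed to a standard classifier (SVM, Voting Classifier, Decision Tree) to separate human from machine text.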


Machine-Generated Text Detection using Deep Learning

Gaggar, Raghav, Bhagchandani, Ashish, Oza, Harsh

arXiv.org Artificial Intelligence

Our research focuses on the crucial challenge of discerning text produced by Large Language Models (LLMs) from human-generated text, which holds significance for various applications. Amid ongoing discussion about whether a model with such functionality is attainable, we present evidence supporting its feasibility. We evaluated our models on multiple datasets, including Twitter Sentiment, Football Commentary, Project Gutenberg, PubMedQA, and SQuAD, confirming the efficacy of the enhanced detection approaches. These datasets were sampled under carefully designed constraints, laying a foundation for future research. We evaluate GPT-3.5-Turbo output against various detectors, including an SVM, RoBERTa-base, and RoBERTa-large. The findings show that detection performance depends predominantly on the sequence length of the input text.
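Since the finding above is that detection performance depends mainly on sequence length, a natural analysis is to bucket test examples by length and compute per-bucket accuracy. A minimal sketch with made-up data:

```python
# Hedged sketch: group detector predictions into length buckets and
# report accuracy per bucket. The data below is fabricated for
# illustration, not from the paper.

def accuracy_by_length(examples, bucket_size=50):
    """examples: list of (n_tokens, predicted_label, true_label) tuples."""
    buckets = {}  # bucket start -> (n_correct, n_total)
    for n_tokens, pred, true in examples:
        b = (n_tokens // bucket_size) * bucket_size
        correct, total = buckets.get(b, (0, 0))
        buckets[b] = (correct + (pred == true), total + 1)
    return {b: correct / total for b, (correct, total) in sorted(buckets.items())}

data = [(10, 1, 0), (30, 1, 1), (60, 1, 1), (80, 1, 1)]
acc = accuracy_by_length(data)  # e.g. short texts score worse here
```

If the reported trend holds, accuracy in the short-text buckets would be systematically lower than in the long-text buckets.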


Who Said That? Benchmarking Social Media AI Detection

Cui, Wanyun, Zhang, Linqiu, Wang, Qianle, Cai, Shuyang

arXiv.org Artificial Intelligence

AI-generated text has proliferated across various online platforms, offering transformative prospects while also posing significant risks related to misinformation and manipulation. We introduce SAID (Social media AI Detection), a benchmark that incorporates real AI-generated text from popular social media platforms such as Zhihu and Quora. Unlike existing benchmarks, SAID deals with content that reflects the sophisticated strategies employed by real AI users on the Internet to evade detection or gain visibility, providing a more realistic and challenging evaluation landscape. A notable finding of our study, based on the Zhihu dataset, is that annotators can distinguish between AI-generated and human-generated texts with an average accuracy of 96.5%. Furthermore, we present a new user-oriented AI-text detection challenge focusing on the practicality and effectiveness of identifying AI-generated text based on user information and multiple responses. The experimental results demonstrate that detection on actual social media platforms is more challenging than traditional simulated AI-text detection, resulting in decreased accuracy. On the other hand, user-oriented AI-generated text detection significantly improves detection accuracy. The advent of AI-generated text has had a profound impact on numerous sectors, including social media platforms. On one side, AI-generated responses enable automation, personalization, and scaling of content creation, thereby revolutionizing how information is disseminated and consumed. Addressing the abuse and malicious use of AI has led to significant research efforts in the field of AI-generated text detection. A range of approaches has been explored, including machine learning algorithms (Guo et al., 2023; Solaiman et al., 2019), text-based feature analysis (Mitchell et al., 2023; Tulchinskii et al., 2023), and positive-unlabeled techniques (Tian et al., 2023).


GPT as a Baseline for Recommendation Explanation Texts

Zhou, Joyce, Joachims, Thorsten

arXiv.org Artificial Intelligence

In this work, we establish a baseline for how modern model-generated text explanations of movie recommendations may help users, and explore which components of these text explanations users like or dislike, especially in contrast to existing human movie reviews. We found that participants gave no significantly different rankings between movies, nor did they give significantly different individual quality scores to reviews of movies that they had never seen before. However, participants did mark reviews as significantly better when they were for movies they had seen before. We also explore specific aspects of movie review texts that participants marked as important for each quality. Overall, we establish that modern LLMs are a promising source of recommendation explanations, and we intend to further explore personalizable text explanations in the future.


ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text

Mitrović, Sandra, Andreoletti, Davide, Ayoub, Omran

arXiv.org Artificial Intelligence

ChatGPT has the ability to generate grammatically flawless and seemingly human replies to different types of questions from various domains. The number of its users and of its applications is growing at an unprecedented rate. Unfortunately, use and abuse come hand in hand. In this paper, we study whether a machine learning model can be effectively trained to accurately distinguish between original human text and seemingly human (that is, ChatGPT-generated) text, especially when this text is short. Furthermore, we employ an explainable artificial intelligence framework to gain insight into the reasoning behind the model trained to differentiate between ChatGPT-generated and human-generated text. The goal is to analyze the model's decisions and determine whether any specific patterns or characteristics can be identified. Our study focuses on short online reviews, conducting two experiments comparing human-generated and ChatGPT-generated text. The first experiment involves ChatGPT text generated from custom queries, while the second involves text generated by rephrasing original human-generated reviews. We fine-tune a Transformer-based model and use it to make predictions, which are then explained using SHAP. We compare our model with a perplexity-score-based approach and find that disambiguation between human and ChatGPT-generated reviews is more challenging for the ML model when using rephrased text. However, our proposed approach still achieves an accuracy of 79%. Using explainability, we observe that ChatGPT's writing is polite, lacks specific details, uses fancy and atypical vocabulary, is impersonal, and typically does not express feelings.
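The perplexity-score baseline mentioned above rests on the idea that machine text tends to be unusually predictable under a language model. A minimal sketch of that scoring idea, where a toy unigram model with add-one smoothing stands in for a real LM (an assumption; the paper's baseline uses an actual language model):

```python
# Toy perplexity scorer: lower perplexity means the text is more
# predictable under the reference model. A unigram model with add-one
# smoothing stands in for a real LM here.
import math
from collections import Counter

def unigram_perplexity(text: str, reference: str) -> float:
    """Perplexity of `text` under a smoothed unigram model of `reference`."""
    ref_counts = Counter(reference.lower().split())
    vocab = len(ref_counts) + 1          # +1 slot for unseen tokens
    total = sum(ref_counts.values())
    tokens = text.lower().split()
    log_prob = 0.0
    for tok in tokens:
        p = (ref_counts[tok] + 1) / (total + vocab)  # add-one smoothing
        log_prob += math.log(p)
    return math.exp(-log_prob / max(len(tokens), 1))

ref = "the cat sat on the mat"
ppl_seen = unigram_perplexity("the cat sat", ref)          # low: in-distribution
ppl_unseen = unigram_perplexity("quantum flux harmonics", ref)  # high: surprising
```

A detector built on this score would threshold the perplexity: suspiciously low values suggest machine generation, higher values suggest human writing.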


MIT researchers unveil new system to improve fake news detection

#artificialintelligence

Bad actors are increasingly using more advanced methods to generate fake news and fool readers into thinking it is legitimate. AI-based text generators, including OpenAI's GPT-2 model, which try to imitate human writers, play a big part in this. To mitigate this, researchers have developed tools to detect artificially generated text. However, new research from MIT suggests there might be a fundamental flaw in the way these detectors work. Traditionally, these tools trace a text's writing style to determine whether it was written by a human or a bot. They assume text written by humans is always legitimate and text generated by bots is always fake.